A Comparison Of Two Methods For Random Labelling of Balls by Vectors of Integers


Doron ZEILBERGER

Greg Kirk [Ki] raised the question of comparing the following two ways of labelling balls. Given $r$
pre-determined positive integers $n_i$ ($1 \le i \le r$), and given $N$ balls ($N$ large), consider two ways to
randomly assign $r$-component vectors of integers $(a_1, \dots, a_r)$ to them, such that $1 \le a_i \le n_i$. We
will call these vectors ‘labels’. Of course altogether there are $\prod_{i=1}^{r} n_i$ possible labels.

First Way: You put all the balls in one big pot. For $i = 1, \dots, r$, at the $i$-th iteration, line up $n_i$
smaller pots, each with capacity $N/n_i$ balls, and labeled with labels $1$ through $n_i$, and, uniformly
at random, distribute the balls into these smaller pots. Assign the $i$-th component of the vector-label
of each ball, $a_i$, to be the label of the pot in which it was dropped. Having done that, you dump
all the balls back into the big pot, and go on to the next iteration.

Second Way: Do the same as above for $i = 1$, except that at the end of the first iteration you
do not dump the balls back into the big pot but proceed as follows. For $i = 2, \dots, r$, assuming
that the balls have already received their first $i - 1$ components, and leaving the balls in their pots from
the $(i-1)$-th iteration, you line up $n_i$ new pots, each with a capacity of $N/n_i$ balls, and labeled
with labels $1$ through $n_i$. For each of the $n_{i-1}$ pots from the previous iteration, individually, we
distribute its contents uniformly at random into the new pots, each of the new pots getting
exactly $N/(n_{i-1} n_i)$ balls from each of the $n_{i-1}$ pots from the previous, $(i-1)$-th, iteration.

Note that in the First Way, assuming that we can reuse the pots, we need $1 + \max(n_1, \dots, n_r)$
pots, one of which should have a capacity of $N$ balls, while in the Second Way, we need
$\max(1 + n_1, n_1 + n_2, \dots, n_{r-1} + n_r)$ pots.

The goal is to maximize the ‘equal representation’ of all the possible $\prod_{i=1}^{r} n_i$ vector-labels. It is
obvious, with either way, that the probability of a ball being assigned any given label is $\prod_{i=1}^{r} n_i^{-1}$,
and hence that the expected number of balls given label $v$, for each
$v \in \prod_{i=1}^{r} \{1, \dots, n_i\}$, is $N / \prod_{i=1}^{r} n_i$.

It is intuitively obvious that in the Second Way the ‘spread’ in the distribution is less than in the
First Way. In fact, when $r = 2$, the Second Way gives a perfect way of equi-distribution. We are
guaranteed that the number of balls given any particular label $(a_1, a_2)$ is exactly $N/(n_1 n_2)$.
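The two procedures can be sketched as a small simulation. This is only an illustration under stated assumptions: the function names `first_way` and `second_way` are hypothetical, filling the pots is modelled by shuffling and cutting into equal blocks, and $N$ is assumed to satisfy the divisibility conditions needed for the pots to fill exactly.

```python
import random
from collections import Counter

def first_way(N, n, rng):
    """Label N balls by the First Way; n is the list (n_1, ..., n_r)."""
    labels = [[] for _ in range(N)]
    for n_i in n:
        balls = list(range(N))          # all balls are back in the big pot
        rng.shuffle(balls)              # uniformly random order
        size = N // n_i                 # capacity of each small pot
        for pos, ball in enumerate(balls):
            labels[ball].append(pos // size + 1)   # pot number in 1..n_i
    return [tuple(label) for label in labels]

def second_way(N, n, rng):
    """Label N balls by the Second Way; balls stay in their pots between iterations."""
    labels = [[] for _ in range(N)]
    pots = [list(range(N))]             # before iteration 1: one big pot
    for n_i in n:
        new_pots = [[] for _ in range(n_i)]
        for pot in pots:                # treat each old pot individually
            rng.shuffle(pot)
            share = len(pot) // n_i     # balls each new pot takes from this old pot
            for pos, ball in enumerate(pot):
                k = pos // share
                labels[ball].append(k + 1)
                new_pots[k].append(ball)
        pots = new_pots
    return [tuple(label) for label in labels]

rng = random.Random(0)
counts_first = Counter(first_way(24, [2, 3], rng))
counts_second = Counter(second_way(24, [2, 3], rng))
print(sorted(counts_first.items()))
print(sorted(counts_second.items()))
```

With $r = 2$, $N = 24$ and $(n_1, n_2) = (2, 3)$, every one of the six labels occurs exactly $24/6 = 4$ times in `counts_second`, whatever the seed, while the counts in `counts_first` generally fluctuate around 4.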

Throughout this note we assume that $N$ is divisible by $\mathrm{lcm}(n_1 n_2, n_2 n_3, \dots, n_{r-1} n_r)$. For any
statement $P$, $\chi(P)$ is 1 or 0 according to whether $P$ is true or false, respectively.

The way to quantify ‘spread’ is via standard deviation, or its square, the variance. By symmetry
it is enough to pick any one fixed label, $v$, say $v = (1, \dots, 1)$.
The ‘random variable’ on a given ‘experiment’ is the ‘number of balls labeled $v$’. To compute its
variance, we will use an old trick, described beautifully in section 8.2 of the modern classic [GKP]. 
This trick can also be used to find the average (i.e. first moment), in which case it is even easier 
to use, and higher moments, in which case it is (usually) harder to use. 

Let $S$ denote the set of all possible outcomes of the ‘labelling experiment’. The total number of
outcomes, in the First Way, is

$$|S| = \prod_{i=1}^{r} \frac{N!}{(N/n_i)!^{\,n_i}} .$$

For each outcome $s$, let $a(s)$ be the quantity ‘number of balls that receive the (fixed) label $v$’.

Let’s first compute the average of this quantity (even though we know the answer, just as a warm-up 
for the calculation of the variance, that would follow). We have 

$$\sum_{s \in S} a(s) = \sum_{s \in S} \sum_{j=1}^{N} \chi(\text{the } j\text{-th ball is labelled } v)
= \sum_{j=1}^{N} \sum_{s \in S_j} 1 , \qquad \text{(Greg)}$$

where the inner sum extends over the set of outcomes, let’s call it $S_j$, of $s \in S$ for which the
$j$-th ball was labelled $v$. By symmetry, this inner sum is independent of $j$, and equals

$$\prod_{i=1}^{r} \frac{(N-1)!}{((N/n_i) - 1)! \, (N/n_i)!^{\,n_i - 1}} ,$$

since at each iteration one of the balls (the $j$-th) is committed to land in one of the pots (Pot $v_i$ in
the $i$-th iteration).
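As a sanity check on this count, one iteration can be enumerated by brute force for small parameters. The sketch below is illustrative only: the helper `outcomes_one_iteration` and the parameter choice $N = 6$, $n_i = 3$ are assumptions made for the example, not part of the original argument.

```python
from itertools import product
from math import factorial

def outcomes_one_iteration(N, n_i):
    """All ways to put N balls into n_i pots that hold exactly N/n_i balls each.

    An outcome is a tuple a with a[ball] = pot number in 1..n_i.
    """
    cap = N // n_i
    return [a for a in product(range(1, n_i + 1), repeat=N)
            if all(a.count(p) == cap for p in range(1, n_i + 1))]

N, n_i, j = 6, 3, 0                      # j is the index of the committed ball
outs = outcomes_one_iteration(N, n_i)
total = len(outs)
committed = sum(1 for a in outs if a[j] == 1)   # ball j lands in pot 1

cap = N // n_i
# Compare with N!/(N/n_i)!^{n_i} and (N-1)!/(((N/n_i)-1)! (N/n_i)!^{n_i-1}).
print(total, factorial(N) // factorial(cap) ** n_i)
print(committed, factorial(N - 1) // (factorial(cap - 1) * factorial(cap) ** (n_i - 1)))
```

For $r$ iterations of the First Way the iterations are independent, so the counts multiply, which gives the products over $i$ used in the text.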

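Hence the sum in (Greg) equals:

$$N \prod_{i=1}^{r} \frac{(N-1)!}{((N/n_i) - 1)! \, (N/n_i)!^{\,n_i - 1}} ,$$

and hence the average is:

$$\frac{1}{|S|} \sum_{s \in S} a(s) = N \prod_{i=1}^{r} \frac{(N-1)! \, (N/n_i)!}{N! \, ((N/n_i) - 1)!}
= N \prod_{i=1}^{r} \frac{N/n_i}{N} = \frac{N}{\prod_{i=1}^{r} n_i} ,$$

as expected (sic!).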

Variance and Standard Deviation 

Let’s recall a few elementary facts about variance. The standard deviation is defined to be the square
root of the variance. Suppose that we have a finite set $S$, and there is some numerical attribute
(random variable) $X(s)$ for every element $s \in S$. Then the variance, $V(X)$, is the ‘average of the
squares of the deviation from the average’, i.e.

$$V(X) = \frac{\sum_{s \in S} (X(s) - \mathrm{av})^2}{|S|} ,$$

where $|S|$ is the number of elements of $S$.

It is easier to compute the related quantity:

$$W(X) = \frac{\sum_{s \in S} \binom{X(s)}{2}}{|S|} .$$


Simple algebra shows that:

$$V(X) = 2 W(X) + \mathrm{av} - \mathrm{av}^2 .$$

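The algebra behind this identity can be spelled out as follows, using only $\binom{X}{2} = X(X-1)/2$ and the definition of the average $\mathrm{av}$ as $\frac{1}{|S|}\sum_{s \in S} X(s)$.

```latex
\begin{align*}
2\,W(X) &= \frac{1}{|S|}\sum_{s \in S} X(s)\bigl(X(s) - 1\bigr)
         = \frac{1}{|S|}\sum_{s \in S} X(s)^2 \;-\; \mathrm{av}, \\
V(X)    &= \frac{1}{|S|}\sum_{s \in S} \bigl(X(s) - \mathrm{av}\bigr)^2
         = \frac{1}{|S|}\sum_{s \in S} X(s)^2 \;-\; \mathrm{av}^2, \\
V(X) - 2\,W(X) &= \mathrm{av} - \mathrm{av}^2 .
\end{align*}
```
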
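Now we are ready to compute $W(a)$.

$$W(a) = \frac{1}{|S|} \sum_{s \in S} \binom{a(s)}{2}
= \frac{1}{|S|} \sum_{s \in S} \sum_{1 \le i < j \le N} \chi(\text{the } i\text{-th and } j\text{-th balls are both labelled } v)$$

$$= \frac{1}{|S|} \sum_{1 \le i < j \le N} [\text{Number of outcomes with the } i\text{-th and } j\text{-th balls both labelled } v] \qquad \text{(Kirk)}$$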


By symmetry, the summand is independent of the pair of balls $(i, j)$ and is easily seen to be equal to

$$\prod_{i=1}^{r} \frac{(N-2)!}{((N/n_i) - 2)! \, (N/n_i)!^{\,n_i - 1}} ,$$

since, at each of the $r$ iterations, two balls are committed to land in a predetermined pot (the $v_i$-th
pot at the $i$-th iteration).

Simple algebra yields

$$W(a) = \binom{N}{2} \prod_{i=1}^{r} n_i^{-2} \, \frac{1 - n_i/N}{1 - 1/N} .$$


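It follows that

$$V(a) = \mathrm{av} - \mathrm{av}^2 + 2 W(a)
= \frac{N}{\prod_{i=1}^{r} n_i} - \frac{N^2}{\prod_{i=1}^{r} n_i^2}
+ N(N-1) \prod_{i=1}^{r} n_i^{-2} \, \frac{1 - n_i/N}{1 - 1/N} .$$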
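The exact expression for $V(a)$ can be evaluated with rational arithmetic, which gives a quick way to see how it behaves as $N$ grows. The sketch below is not from the original note: the function names are hypothetical, and `first_way_leading_term` encodes the leading term $\frac{N}{\prod n_i}\bigl(1 - \frac{1 + \sum_i (n_i - 1)}{\prod n_i}\bigr)$ obtained here by expanding the product to first order in $1/N$.

```python
from fractions import Fraction
from math import prod

def first_way_variance(N, n):
    """Exact variance V = av - av^2 + 2W for the First Way; n = (n_1, ..., n_r)."""
    P = prod(n)
    av = Fraction(N, P)
    two_W = Fraction(N * (N - 1), P * P)
    for n_i in n:
        two_W *= Fraction(N - n_i, N - 1)   # equals (1 - n_i/N) / (1 - 1/N)
    return av - av * av + two_W

def first_way_leading_term(N, n):
    """Assumed leading term from the first-order expansion in 1/N (see lead-in)."""
    P = prod(n)
    return Fraction(N, P) * (1 - Fraction(1 + sum(n_i - 1 for n_i in n), P))

n = [2, 3, 4]
for N in (24, 240, 2400):
    exact = first_way_variance(N, n)
    approx = first_way_leading_term(N, n)
    print(N, float(exact), float(approx), float(exact - approx))
```

Two exact checks are available: for $r = 1$ the variance is 0, since each pot is filled to capacity, and for $r = 2$ the count is hypergeometric, so that for example `first_way_variance(8, [2, 2])` equals $4/7$.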


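Assuming that $N$ is large, so that $1/N$ is small, and using the approximation $1/(1 - x) = 1 + x + O(x^2)$,
we get the following proposition.

Proposition 1: The average number of occurrences of any given vector $v$ as a label, in the First
Way, is $N / \prod_{i=1}^{r} n_i$, and its variance is:

$$\frac{N}{\prod_{i=1}^{r} n_i} \left( 1 - \frac{1 + \sum_{i=1}^{r} (n_i - 1)}{\prod_{i=1}^{r} n_i} \right) + O(1) .$$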


Analysis of the Second Way 

Here the total number of outcomes is

$$|S| = \frac{N!}{(N/n_1)!^{\,n_1}} \prod_{i=2}^{r} \left[ \frac{(N/n_{i-1})!}{(N/(n_{i-1} n_i))!^{\,n_i}} \right]^{n_{i-1}} .$$

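Using an argument analogous to the one before, the number of outcomes with the $i$-th and $j$-th balls both labelled
$v$ equals

$$\frac{(N-2)!}{((N/n_1) - 2)! \, (N/n_1)!^{\,n_1 - 1}}
\prod_{i=2}^{r} \frac{((N/n_{i-1}) - 2)!}{((N/(n_{i-1} n_i)) - 2)! \, (N/(n_{i-1} n_i))!^{\,n_i - 1}}
\left[ \frac{(N/n_{i-1})!}{(N/(n_{i-1} n_i))!^{\,n_i}} \right]^{n_{i-1} - 1} .$$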

Simple algebra yields that

$$W(a) = \frac{N^2}{2} \left( 1 - \frac{n_1}{N} \right) \prod_{i=1}^{r} n_i^{-2}
\prod_{i=2}^{r} \frac{1 - n_{i-1} n_i / N}{1 - n_{i-1}/N} ,$$

which, as before, leads to the following proposition.

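Proposition 2: The average number of occurrences of any given vector $v$, as a label, in the Second
Way, is $N / \prod_{i=1}^{r} n_i$, and its variance is:

$$\frac{N}{\prod_{i=1}^{r} n_i} \left( 1 - \frac{n_1 + \sum_{i=2}^{r} (n_i - 1) \, n_{i-1}}{\prod_{i=1}^{r} n_i} \right) + O(1) ,$$

which is slightly smaller.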
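To see the comparison numerically, the exact variances of the two ways can be computed side by side with rational arithmetic. This is a sketch under stated assumptions: the function names are hypothetical, and the Second Way value uses the ratio of the two outcome counts given above, simplified here to $2W = \frac{N}{n_1}\bigl(\frac{N}{n_1} - 1\bigr) \prod_{i \ge 2} \frac{m_i (m_i - 1)}{M_i (M_i - 1)}$ with $m_i = N/(n_{i-1} n_i)$ and $M_i = N/n_{i-1}$.

```python
from fractions import Fraction
from math import prod

def first_way_variance(N, n):
    """Exact First Way variance, V = av - av^2 + 2W."""
    P = prod(n)
    av = Fraction(N, P)
    two_W = Fraction(N * (N - 1), P * P)
    for n_i in n:
        two_W *= Fraction(N - n_i, N - 1)   # (1 - n_i/N) / (1 - 1/N)
    return av - av * av + two_W

def second_way_variance(N, n):
    """Exact Second Way variance from the simplified ratio of outcome counts."""
    av = Fraction(N, prod(n))
    first = N // n[0]                        # balls per pot after iteration 1
    two_W = Fraction(first * (first - 1))
    for prev, cur in zip(n, n[1:]):
        big = N // prev                      # M_i: size of a pot from iteration i-1
        small = N // (prev * cur)            # m_i: share sent to each new pot
        two_W *= Fraction(small * (small - 1), big * (big - 1))
    return av - av * av + two_W

n = [2, 2, 2]
for N in (16, 160, 1600):
    print(N, float(first_way_variance(N, n)), float(second_way_variance(N, n)))
```

For $r = 2$ the function `second_way_variance` returns exactly 0, matching the perfect equi-distribution noted earlier, and for $n = (2, 2, 2)$ the Second Way value is roughly half the First Way value at each $N$ tried.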